{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to Python\n",
    "Welcome to COMP9417!  Throughout this course, you will be introduced to a variety of machine learning algorithms.  These labs are intended to give you practical experience with setting up and running these algorithms on realistic data sets.\n",
    "\n",
    "There are a few concepts you should have some understanding of to complete this work:\n",
    "* <a href=\"https://www.tutorialspoint.com/python/python_basic_syntax.htm\">Basic Python</a>\n",
    "* <a href=\"https://onlinestatbook.com/2/regression/intro.html\">Linear Regression</a>\n",
    "\n",
    "## Some practice for machine learning packages\n",
    "For the lab work in this course, we will be using the Python language, which you may or may not have used before. If not, it may help you to do some independent study to pick up the basics. However, this is not a Python course: the focus is on applying practical machine learning algorithms, and Python is simply a tool for us to do so.\n",
    "\n",
    "* <b>NumPy</b> (Numerical Python) is a popular library for numerical computing on arrays.\n",
    "\n",
    "* <b>SciPy</b> is an open-source library for scientific computing that covers mathematics, science and engineering.\n",
    "\n",
    "* <b>Pandas</b> is a data storage and analysis library that primarily provides utilities to deal with structured records, normally stored as CSVs or tables.\n",
    "\n",
    "* <b>Scikit-Learn</b> is a Python library of machine learning algorithms for tasks such as classification, regression and clustering.\n",
    "\n",
    "* <b>Matplotlib</b> is a Python plotting library for creating static, animated and interactive plots.\n",
    "\n",
    "* <b>Seaborn</b>  is a Python data visualization library based on matplotlib, providing a high-level interface for drawing attractive and informative statistical graphics.\n",
    "\n",
    "* <b>TensorFlow</b>  is a deep learning library developed by Google."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import all the packages that we will learn or practice\n",
    "Before starting, check that the following packages are installed: numpy, scipy, pandas, scikit-learn, matplotlib and seaborn.\n",
    "\n",
    "If you are using Linux, you can run the following command in a terminal to install these packages:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "pip install numpy scipy pandas scikit-learn matplotlib seaborn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you are using Anaconda, you can run the following command at the Anaconda prompt to install these packages:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "scrolled": true
   },
   "source": [
    "conda install numpy scipy pandas scikit-learn matplotlib seaborn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To run code in the Jupyter environment, click in the code cell and press Ctrl+Enter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy\n",
    "import pandas as pd\n",
    "import sklearn as sk\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## NumPy practice\n",
    "### Basics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "list object: \n",
      "\n",
      "[1, 2, 3, 4]\n",
      "\n",
      "array object: \n",
      "\n",
      "[1. 2. 3. 4.]\n",
      "\n",
      "zero matrix: \n",
      "\n",
      "[[0. 0. 0.]\n",
      " [0. 0. 0.]\n",
      " [0. 0. 0.]\n",
      " [0. 0. 0.]]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "list_object=[1,2,3,4]                          # Creating a list object\n",
    "array=np.array(list_object)                    # Converting the list into a NumPy array (integer dtype is inferred)\n",
    "array=np.array(list_object,dtype=np.float32)   # Recreating it with an explicit data type (overwrites the line above)\n",
    "zeros=np.zeros((4,3))                          # Creating a 4x3 matrix of zeros\n",
    "print(\"list object: \\n\")\n",
    "print(list_object)\n",
    "print()\n",
    "print(\"array object: \\n\")\n",
    "print(array)\n",
    "print()\n",
    "print(\"zero matrix: \\n\")\n",
    "print(zeros)\n",
    "print()"
   ]
  },
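  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It often helps to inspect an array's attributes before computing with it. The cell below is a minimal sketch rather than part of the original lab: it rebuilds the same two arrays so that it runs on its own, then prints the standard NumPy attributes `shape`, `dtype` and `ndim`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Same values as in the cell above, recreated so this cell is self-contained\n",
    "array = np.array([1, 2, 3, 4], dtype=np.float32)\n",
    "zeros = np.zeros((4, 3))\n",
    "\n",
    "print(\"array shape:\", array.shape)   # (4,)  -> one dimension with 4 elements\n",
    "print(\"array dtype:\", array.dtype)   # float32, as requested explicitly\n",
    "print(\"zeros shape:\", zeros.shape)   # (4, 3) -> 4 rows, 3 columns\n",
    "print(\"zeros ndim: \", zeros.ndim)    # 2"
   ]
  },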
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Operators\n",
    "Arithmetic operators"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "z = x + y:  [6 9]\n",
      "z = x * y:  [ 8 18]\n",
      "z = x / y:  [2. 2.]\n"
     ]
    }
   ],
   "source": [
    "x=np.array([4,6])\n",
    "y=np.array([2,3])\n",
    "z = x + y                                # x and y are NumPy arrays of the same shape; arithmetic is element-wise\n",
    "print(\"z = x + y: \",z)\n",
    "z = x * y\n",
    "print(\"z = x * y: \",z)\n",
    "z = x / y\n",
    "print(\"z = x / y: \",z)"
   ]
  },
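  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that `*` multiplies element by element; it is not a dot product. The following sketch contrasts the two on the same values as above and also shows a scalar being broadcast across an array. It is an illustrative addition, assuming only standard NumPy (`@` is the built-in matrix-multiplication operator)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "x = np.array([4, 6])\n",
    "y = np.array([2, 3])\n",
    "\n",
    "print(\"element-wise x * y:\", x * y)    # [ 8 18]\n",
    "print(\"dot product x @ y: \", x @ y)    # 4*2 + 6*3 = 26\n",
    "print(\"broadcast x * 10:  \", x * 10)   # [40 60], the scalar is applied to every element"
   ]
  },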
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Comparison operators"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x:  [4 6]\n",
      "y:  [2 3]\n",
      "z = x > y: [ True  True]\n",
      "z = x > 5: [False  True]\n"
     ]
    }
   ],
   "source": [
    "print(\"x: \", x)\n",
    "print(\"y: \", y)\n",
    "z = x > y\n",
    "print(\"z = x > y:\", z)\n",
    "z = x > 5\n",
    "print(\"z = x > 5:\", z)"
   ]
  },
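  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Boolean array produced by a comparison can be used directly as a mask to select elements. A short sketch of this idea, added for illustration and using the same `x` as above (`np.where` is a standard NumPy function; the name `mask` is only a convention):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "x = np.array([4, 6])\n",
    "mask = x > 5                    # [False  True]\n",
    "\n",
    "print(x[mask])                  # keeps only the elements where mask is True: [6]\n",
    "print(np.where(mask, x, 0))     # keeps x where mask is True, otherwise 0: [0 6]"
   ]
  },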
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unary operators (operations on a single array, such as mean and sum)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "matrix A: \n",
      " [[0 1 2]\n",
      " [3 4 5]\n",
      " [6 7 8]]\n",
      "mean: 4.0\n",
      "col_sum [ 9 12 15]\n",
      "row_sum [ 3 12 21]\n"
     ]
    }
   ],
   "source": [
    "A = np.arange(9).reshape((3,3)) \n",
    "print(\"matrix A: \\n\",A)\n",
    "mean_a = np.mean(A)                     # calculates mean of all elements\n",
    "print(\"mean:\", mean_a)\n",
    "col_sum = A.sum(axis = 0)               # calculates sum of each column\n",
    "print(\"col_sum\", col_sum)\n",
    "row_sum = A.sum(axis = 1)               # calculates sum of each row\n",
    "print(\"row_sum\", row_sum)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Exploration Using Pandas\n",
    "Load the diabetes data set, save it to a CSV file, then load it back from that file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import datasets\n",
    "\n",
    "# Load the diabetes dataset\n",
    "diabetes = datasets.load_diabetes()\n",
    "\n",
    "# Putting the dataset into a pandas DataFrame\n",
    "data = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)\n",
    "target = pd.DataFrame(diabetes.target, columns=[\"target\"])\n",
    "\n",
    "# Combining the two dataframes into one\n",
    "df = pd.concat([data,target], axis=1)\n",
    "\n",
    "# Saving the data frame into the \"diabetes.csv\" file (index=False omits the row index)\n",
    "df.to_csv(\"diabetes.csv\", index=False) \n",
    "\n",
    "# Loading the data from the CSV file\n",
    "csv_df = pd.read_csv(\"diabetes.csv\")"
   ]
  },
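  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick way to convince yourself that a CSV round trip preserves the data, and to see why `index=False` is used, is to try it on a tiny frame. The sketch below is an illustrative addition: `small_df` is a hypothetical two-row frame, and an in-memory `io.StringIO` buffer stands in for the file on disk so that the cell runs on its own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical two-row frame standing in for the diabetes data\n",
    "small_df = pd.DataFrame({\"bmi\": [0.061696, -0.051474], \"target\": [151.0, 75.0]})\n",
    "\n",
    "# Round trip through an in-memory buffer instead of a file on disk\n",
    "buffer = io.StringIO()\n",
    "small_df.to_csv(buffer, index=False)\n",
    "buffer.seek(0)\n",
    "restored = pd.read_csv(buffer)\n",
    "print(\"same columns:\", list(restored.columns) == list(small_df.columns))\n",
    "print(\"same values: \", np.allclose(restored.values, small_df.values))\n",
    "\n",
    "# Without index=False the row index is written as an extra, unnamed column\n",
    "with_index = io.StringIO()\n",
    "small_df.to_csv(with_index)\n",
    "with_index.seek(0)\n",
    "print(pd.read_csv(with_index).columns.tolist())"
   ]
  },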
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Analyzing DataFrames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>bmi</th>\n",
       "      <th>bp</th>\n",
       "      <th>s1</th>\n",
       "      <th>s2</th>\n",
       "      <th>s3</th>\n",
       "      <th>s4</th>\n",
       "      <th>s5</th>\n",
       "      <th>s6</th>\n",
       "      <th>target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.038076</td>\n",
       "      <td>0.050680</td>\n",
       "      <td>0.061696</td>\n",
       "      <td>0.021872</td>\n",
       "      <td>-0.044223</td>\n",
       "      <td>-0.034821</td>\n",
       "      <td>-0.043401</td>\n",
       "      <td>-0.002592</td>\n",
       "      <td>0.019908</td>\n",
       "      <td>-0.017646</td>\n",
       "      <td>151.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.001882</td>\n",
       "      <td>-0.044642</td>\n",
       "      <td>-0.051474</td>\n",
       "      <td>-0.026328</td>\n",
       "      <td>-0.008449</td>\n",
       "      <td>-0.019163</td>\n",
       "      <td>0.074412</td>\n",
       "      <td>-0.039493</td>\n",
       "      <td>-0.068330</td>\n",
       "      <td>-0.092204</td>\n",
       "      <td>75.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.085299</td>\n",
       "      <td>0.050680</td>\n",
       "      <td>0.044451</td>\n",
       "      <td>-0.005671</td>\n",
       "      <td>-0.045599</td>\n",
       "      <td>-0.034194</td>\n",
       "      <td>-0.032356</td>\n",
       "      <td>-0.002592</td>\n",
       "      <td>0.002864</td>\n",
       "      <td>-0.025930</td>\n",
       "      <td>141.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.089063</td>\n",
       "      <td>-0.044642</td>\n",
       "      <td>-0.011595</td>\n",
       "      <td>-0.036656</td>\n",
       "      <td>0.012191</td>\n",
       "      <td>0.024991</td>\n",
       "      <td>-0.036038</td>\n",
       "      <td>0.034309</td>\n",
       "      <td>0.022692</td>\n",
       "      <td>-0.009362</td>\n",
       "      <td>206.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.005383</td>\n",
       "      <td>-0.044642</td>\n",
       "      <td>-0.036385</td>\n",
       "      <td>0.021872</td>\n",
       "      <td>0.003935</td>\n",
       "      <td>0.015596</td>\n",
       "      <td>0.008142</td>\n",
       "      <td>-0.002592</td>\n",
       "      <td>-0.031991</td>\n",
       "      <td>-0.046641</td>\n",
       "      <td>135.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        age       sex       bmi        bp        s1        s2        s3  \\\n",
       "0  0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401   \n",
       "1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412   \n",
       "2  0.085299  0.050680  0.044451 -0.005671 -0.045599 -0.034194 -0.032356   \n",
       "3 -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038   \n",
       "4  0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142   \n",
       "\n",
       "         s4        s5        s6  target  \n",
       "0 -0.002592  0.019908 -0.017646   151.0  \n",
       "1 -0.039493 -0.068330 -0.092204    75.0  \n",
       "2 -0.002592  0.002864 -0.025930   141.0  \n",
       "3  0.034309  0.022692 -0.009362   206.0  \n",
       "4 -0.002592 -0.031991 -0.046641   135.0  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# prints the first few rows of the dataframe\n",
    "csv_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 442 entries, 0 to 441\n",
      "Data columns (total 11 columns):\n",
      "age       442 non-null float64\n",
      "sex       442 non-null float64\n",
      "bmi       442 non-null float64\n",
      "bp        442 non-null float64\n",
      "s1        442 non-null float64\n",
      "s2        442 non-null float64\n",
      "s3        442 non-null float64\n",
      "s4        442 non-null float64\n",
      "s5        442 non-null float64\n",
      "s6        442 non-null float64\n",
      "target    442 non-null float64\n",
      "dtypes: float64(11)\n",
      "memory usage: 38.1 KB\n"
     ]
    }
   ],
   "source": [
    "# provides a concise summary of the dataframe\n",
    "csv_df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>bmi</th>\n",
       "      <th>bp</th>\n",
       "      <th>s1</th>\n",
       "      <th>s2</th>\n",
       "      <th>s3</th>\n",
       "      <th>s4</th>\n",
       "      <th>s5</th>\n",
       "      <th>s6</th>\n",
       "      <th>target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>4.420000e+02</td>\n",
       "      <td>4.420000e+02</td>\n",
       "      <td>4.420000e+02</td>\n",
       "      <td>4.420000e+02</td>\n",
       "      <td>4.420000e+02</td>\n",
       "      <td>4.420000e+02</td>\n",
       "      <td>4.420000e+02</td>\n",
       "      <td>4.420000e+02</td>\n",
       "      <td>4.420000e+02</td>\n",
       "      <td>4.420000e+02</td>\n",
       "      <td>442.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>-3.639623e-16</td>\n",
       "      <td>1.269723e-16</td>\n",
       "      <td>-8.016463e-16</td>\n",
       "      <td>1.288562e-16</td>\n",
       "      <td>-8.992304e-17</td>\n",
       "      <td>1.296097e-16</td>\n",
       "      <td>-4.563971e-16</td>\n",
       "      <td>3.875733e-16</td>\n",
       "      <td>-3.845592e-16</td>\n",
       "      <td>-3.398488e-16</td>\n",
       "      <td>152.133484</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>4.761905e-02</td>\n",
       "      <td>4.761905e-02</td>\n",
       "      <td>4.761905e-02</td>\n",
       "      <td>4.761905e-02</td>\n",
       "      <td>4.761905e-02</td>\n",
       "      <td>4.761905e-02</td>\n",
       "      <td>4.761905e-02</td>\n",
       "      <td>4.761905e-02</td>\n",
       "      <td>4.761905e-02</td>\n",
       "      <td>4.761905e-02</td>\n",
       "      <td>77.093005</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>-1.072256e-01</td>\n",
       "      <td>-4.464164e-02</td>\n",
       "      <td>-9.027530e-02</td>\n",
       "      <td>-1.123996e-01</td>\n",
       "      <td>-1.267807e-01</td>\n",
       "      <td>-1.156131e-01</td>\n",
       "      <td>-1.023071e-01</td>\n",
       "      <td>-7.639450e-02</td>\n",
       "      <td>-1.260974e-01</td>\n",
       "      <td>-1.377672e-01</td>\n",
       "      <td>25.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>-3.729927e-02</td>\n",
       "      <td>-4.464164e-02</td>\n",
       "      <td>-3.422907e-02</td>\n",
       "      <td>-3.665645e-02</td>\n",
       "      <td>-3.424784e-02</td>\n",
       "      <td>-3.035840e-02</td>\n",
       "      <td>-3.511716e-02</td>\n",
       "      <td>-3.949338e-02</td>\n",
       "      <td>-3.324879e-02</td>\n",
       "      <td>-3.317903e-02</td>\n",
       "      <td>87.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>5.383060e-03</td>\n",
       "      <td>-4.464164e-02</td>\n",
       "      <td>-7.283766e-03</td>\n",
       "      <td>-5.670611e-03</td>\n",
       "      <td>-4.320866e-03</td>\n",
       "      <td>-3.819065e-03</td>\n",
       "      <td>-6.584468e-03</td>\n",
       "      <td>-2.592262e-03</td>\n",
       "      <td>-1.947634e-03</td>\n",
       "      <td>-1.077698e-03</td>\n",
       "      <td>140.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>3.807591e-02</td>\n",
       "      <td>5.068012e-02</td>\n",
       "      <td>3.124802e-02</td>\n",
       "      <td>3.564384e-02</td>\n",
       "      <td>2.835801e-02</td>\n",
       "      <td>2.984439e-02</td>\n",
       "      <td>2.931150e-02</td>\n",
       "      <td>3.430886e-02</td>\n",
       "      <td>3.243323e-02</td>\n",
       "      <td>2.791705e-02</td>\n",
       "      <td>211.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>1.107267e-01</td>\n",
       "      <td>5.068012e-02</td>\n",
       "      <td>1.705552e-01</td>\n",
       "      <td>1.320442e-01</td>\n",
       "      <td>1.539137e-01</td>\n",
       "      <td>1.987880e-01</td>\n",
       "      <td>1.811791e-01</td>\n",
       "      <td>1.852344e-01</td>\n",
       "      <td>1.335990e-01</td>\n",
       "      <td>1.356118e-01</td>\n",
       "      <td>346.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                age           sex           bmi            bp            s1  \\\n",
       "count  4.420000e+02  4.420000e+02  4.420000e+02  4.420000e+02  4.420000e+02   \n",
       "mean  -3.639623e-16  1.269723e-16 -8.016463e-16  1.288562e-16 -8.992304e-17   \n",
       "std    4.761905e-02  4.761905e-02  4.761905e-02  4.761905e-02  4.761905e-02   \n",
       "min   -1.072256e-01 -4.464164e-02 -9.027530e-02 -1.123996e-01 -1.267807e-01   \n",
       "25%   -3.729927e-02 -4.464164e-02 -3.422907e-02 -3.665645e-02 -3.424784e-02   \n",
       "50%    5.383060e-03 -4.464164e-02 -7.283766e-03 -5.670611e-03 -4.320866e-03   \n",
       "75%    3.807591e-02  5.068012e-02  3.124802e-02  3.564384e-02  2.835801e-02   \n",
       "max    1.107267e-01  5.068012e-02  1.705552e-01  1.320442e-01  1.539137e-01   \n",
       "\n",
       "                 s2            s3            s4            s5            s6  \\\n",
       "count  4.420000e+02  4.420000e+02  4.420000e+02  4.420000e+02  4.420000e+02   \n",
       "mean   1.296097e-16 -4.563971e-16  3.875733e-16 -3.845592e-16 -3.398488e-16   \n",
       "std    4.761905e-02  4.761905e-02  4.761905e-02  4.761905e-02  4.761905e-02   \n",
       "min   -1.156131e-01 -1.023071e-01 -7.639450e-02 -1.260974e-01 -1.377672e-01   \n",
       "25%   -3.035840e-02 -3.511716e-02 -3.949338e-02 -3.324879e-02 -3.317903e-02   \n",
       "50%   -3.819065e-03 -6.584468e-03 -2.592262e-03 -1.947634e-03 -1.077698e-03   \n",
       "75%    2.984439e-02  2.931150e-02  3.430886e-02  3.243323e-02  2.791705e-02   \n",
       "max    1.987880e-01  1.811791e-01  1.852344e-01  1.335990e-01  1.356118e-01   \n",
       "\n",
       "           target  \n",
       "count  442.000000  \n",
       "mean   152.133484  \n",
       "std     77.093005  \n",
       "min     25.000000  \n",
       "25%     87.000000  \n",
       "50%    140.500000  \n",
       "75%    211.500000  \n",
       "max    346.000000  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# provides descriptive statistics of central tendency, dispersion and shape\n",
    "csv_df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Views and Slicing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>bmi</th>\n",
       "      <th>bp</th>\n",
       "      <th>s1</th>\n",
       "      <th>s2</th>\n",
       "      <th>s3</th>\n",
       "      <th>s4</th>\n",
       "      <th>s5</th>\n",
       "      <th>s6</th>\n",
       "      <th>target</th>\n",
       "      <th>fat</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.038076</td>\n",
       "      <td>0.050680</td>\n",
       "      <td>0.061696</td>\n",
       "      <td>0.021872</td>\n",
       "      <td>-0.044223</td>\n",
       "      <td>-0.034821</td>\n",
       "      <td>-0.043401</td>\n",
       "      <td>-0.002592</td>\n",
       "      <td>0.019908</td>\n",
       "      <td>-0.017646</td>\n",
       "      <td>151.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.085299</td>\n",
       "      <td>0.050680</td>\n",
       "      <td>0.044451</td>\n",
       "      <td>-0.005671</td>\n",
       "      <td>-0.045599</td>\n",
       "      <td>-0.034194</td>\n",
       "      <td>-0.032356</td>\n",
       "      <td>-0.002592</td>\n",
       "      <td>0.002864</td>\n",
       "      <td>-0.025930</td>\n",
       "      <td>141.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.041708</td>\n",
       "      <td>0.050680</td>\n",
       "      <td>0.061696</td>\n",
       "      <td>-0.040099</td>\n",
       "      <td>-0.013953</td>\n",
       "      <td>0.006202</td>\n",
       "      <td>-0.028674</td>\n",
       "      <td>-0.002592</td>\n",
       "      <td>-0.014956</td>\n",
       "      <td>0.011349</td>\n",
       "      <td>110.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>-0.070900</td>\n",
       "      <td>-0.044642</td>\n",
       "      <td>0.039062</td>\n",
       "      <td>-0.033214</td>\n",
       "      <td>-0.012577</td>\n",
       "      <td>-0.034508</td>\n",
       "      <td>-0.024993</td>\n",
       "      <td>-0.002592</td>\n",
       "      <td>0.067736</td>\n",
       "      <td>-0.013504</td>\n",
       "      <td>310.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>0.027178</td>\n",
       "      <td>0.050680</td>\n",
       "      <td>0.017506</td>\n",
       "      <td>-0.033214</td>\n",
       "      <td>-0.007073</td>\n",
       "      <td>0.045972</td>\n",
       "      <td>-0.065491</td>\n",
       "      <td>0.071210</td>\n",
       "      <td>-0.096433</td>\n",
       "      <td>-0.059067</td>\n",
       "      <td>69.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>-0.005515</td>\n",
       "      <td>-0.044642</td>\n",
       "      <td>0.042296</td>\n",
       "      <td>0.049415</td>\n",
       "      <td>0.024574</td>\n",
       "      <td>-0.023861</td>\n",
       "      <td>0.074412</td>\n",
       "      <td>-0.039493</td>\n",
       "      <td>0.052280</td>\n",
       "      <td>0.027917</td>\n",
       "      <td>166.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>0.070769</td>\n",
       "      <td>0.050680</td>\n",
       "      <td>0.012117</td>\n",
       "      <td>0.056301</td>\n",
       "      <td>0.034206</td>\n",
       "      <td>0.049416</td>\n",
       "      <td>-0.039719</td>\n",
       "      <td>0.034309</td>\n",
       "      <td>0.027368</td>\n",
       "      <td>-0.001078</td>\n",
       "      <td>144.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>0.045341</td>\n",
       "      <td>0.050680</td>\n",
       "      <td>0.060618</td>\n",
       "      <td>0.031053</td>\n",
       "      <td>0.028702</td>\n",
       "      <td>-0.047347</td>\n",
       "      <td>-0.054446</td>\n",
       "      <td>0.071210</td>\n",
       "      <td>0.133599</td>\n",
       "      <td>0.135612</td>\n",
       "      <td>245.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>-0.063635</td>\n",
       "      <td>-0.044642</td>\n",
       "      <td>0.035829</td>\n",
       "      <td>-0.022885</td>\n",
       "      <td>-0.030464</td>\n",
       "      <td>-0.018850</td>\n",
       "      <td>-0.006584</td>\n",
       "      <td>-0.002592</td>\n",
       "      <td>-0.025952</td>\n",
       "      <td>-0.054925</td>\n",
       "      <td>184.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>-0.023677</td>\n",
       "      <td>-0.044642</td>\n",
       "      <td>0.059541</td>\n",
       "      <td>-0.040099</td>\n",
       "      <td>-0.042848</td>\n",
       "      <td>-0.043589</td>\n",
       "      <td>0.011824</td>\n",
       "      <td>-0.039493</td>\n",
       "      <td>-0.015998</td>\n",
       "      <td>0.040343</td>\n",
       "      <td>85.0</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         age       sex       bmi        bp        s1        s2        s3  \\\n",
       "0   0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401   \n",
       "2   0.085299  0.050680  0.044451 -0.005671 -0.045599 -0.034194 -0.032356   \n",
       "8   0.041708  0.050680  0.061696 -0.040099 -0.013953  0.006202 -0.028674   \n",
       "9  -0.070900 -0.044642  0.039062 -0.033214 -0.012577 -0.034508 -0.024993   \n",
       "11  0.027178  0.050680  0.017506 -0.033214 -0.007073  0.045972 -0.065491   \n",
       "16 -0.005515 -0.044642  0.042296  0.049415  0.024574 -0.023861  0.074412   \n",
       "17  0.070769  0.050680  0.012117  0.056301  0.034206  0.049416 -0.039719   \n",
       "23  0.045341  0.050680  0.060618  0.031053  0.028702 -0.047347 -0.054446   \n",
       "24 -0.063635 -0.044642  0.035829 -0.022885 -0.030464 -0.018850 -0.006584   \n",
       "27 -0.023677 -0.044642  0.059541 -0.040099 -0.042848 -0.043589  0.011824   \n",
       "\n",
       "          s4        s5        s6  target   fat  \n",
       "0  -0.002592  0.019908 -0.017646   151.0  True  \n",
       "2  -0.002592  0.002864 -0.025930   141.0  True  \n",
       "8  -0.002592 -0.014956  0.011349   110.0  True  \n",
       "9  -0.002592  0.067736 -0.013504   310.0  True  \n",
       "11  0.071210 -0.096433 -0.059067    69.0  True  \n",
       "16 -0.039493  0.052280  0.027917   166.0  True  \n",
       "17  0.034309  0.027368 -0.001078   144.0  True  \n",
       "23  0.071210  0.133599  0.135612   245.0  True  \n",
       "24 -0.002592 -0.025952 -0.054925   184.0  True  \n",
       "27 -0.039493 -0.015998  0.040343    85.0  True  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bmi = csv_df[\"bmi\"]                       # get the bmi column as a Series\n",
    "\n",
    "csv_df[\"fat\"] = csv_df[\"bmi\"] > 0         # creates a new Boolean column that is True where bmi > 0\n",
    "\n",
    "csv_df[csv_df.fat]                        # selects all rows where bmi > 0 (not displayed: only the last expression is shown)\n",
    "\n",
    "csv_df[csv_df.fat][:10]                   # selecting first 10 rows where bmi > 0"
   ]
  },
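  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same kind of selection can also be written with `.loc`, which accepts a Boolean mask and a list of columns in one step. This is a minimal sketch added for illustration: `toy` is a hypothetical four-row frame whose values are copied from the first rows shown earlier, so the cell does not depend on `csv_df`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy frame with two of the diabetes columns\n",
    "toy = pd.DataFrame({\"bmi\": [0.061696, -0.051474, 0.044451, -0.011595],\n",
    "                    \"target\": [151.0, 75.0, 141.0, 206.0]})\n",
    "\n",
    "mask = toy[\"bmi\"] > 0                        # Boolean Series, one value per row\n",
    "selected = toy.loc[mask, [\"bmi\", \"target\"]]  # rows where mask is True, chosen columns\n",
    "\n",
    "print(selected)\n",
    "print(\"rows selected:\", mask.sum())          # True counts as 1, so this is 2"
   ]
  },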
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Matplotlib practice"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3Xl0FFXax/HvZScoLxJAEUmiDrKOIERFcUYH0FEZBbcZMCCuEVDEBQXEcUdFUUAFFEFEiIqiDozD4IK7IhhWgcCAyqYoEHANypL7/lHdoZNUdTpLp9PVv885fZKurq66fQqevnnq3ucaay0iIuJf1WLdABERiS4FehERn1OgFxHxOQV6ERGfU6AXEfE5BXoREZ9ToBcR8TkFehERn1OgFxHxuRqxbgBAo0aNbFpaWqybISISV5YsWbLTWtu4pP2qRKBPS0sjOzs71s0QEYkrxphNkeyn1I2IiM8p0IuI+JwCvYiIzynQi4j4nAK9iIjPlRjojTHPGmO2G2NWhWxraIx52xizPvDzsMB2Y4x53BizwRiz0hjTMZqNFxGJV1lZkJYG1ao5P7OyoneuSHr0zwFnF9k2HFhgrW0BLAg8BzgHaBF4ZAKTKqaZIiL+kZUFmZmwaRNY6/zMzIxesC8x0FtrPwR2FdncE5ge+H060Ctk+/PW8RnQwBjTtKIaKyLiByNHQl5e4W15ec72aChrjv5wa+02gMDPJoHtzYAtIfttDWwTEUkI4VIywdc2eUxz2rw5Om2q6JuxxmWb6+rjxphMY0y2MSZ7x44dFdwMEZHocgvobimZfv3AGGjUCK68snCQr8G+QsdMSYlOW8sa6L8PpmQCP7cHtm8FmofsdxTwrdsBrLWTrbXp1tr0xo1LLNUgIlJleOXYhwwpnpKxga5ubi7s3Xtw+znMYx0t6cY7ACQlwahR0WlvWQP9XKB/4Pf+wJyQ7ZcFRt90Bn4MpnhERPzCK8eem1vye5uxlVe4mHn04Hdqk0cSqakweTJkZESnvSUWNTPGvAicATQyxmwF7gIeAl42xlwFbAYuCew+DzgX2ADkAVdEoc0iIjFVllx6dfYzmCe4lzupwX5G8ACPcgtHptZi48YKb2IhJQZ6a20fj5e6uexrgevK2ygRkaosJcX9hmpyMuzZU7y3fzKf8RQD6MAK3qAHg3mCjRwd1XRNKM2MFREppVGjnJx6qKQkGD/eScGkpjrbGrKLp8nkM04hmVwuqf4alzf8N5vM0VFP14RSoBcRKaWMjIMB3RgKBe2MDNj4tcU+N51vDm3JlTzLY9zMX5vn0Gv6BezMNeTnw8aNlRPkQYFeRKSYSMoTZGQ4wbpY0F6zBs44Ay6/nDrtWlBjxVJuto+yZvMhlRbYi1KgFxEJUebyBHl5MGIEtG8Pq1bBM8/Axx/D8cdXSrvDUaAXEQlRpvIEb7wBbdrAQw9B376wdi1cfbXzJ0EVUDVaISJSRXgNnXTdvnkzXHABnHceHHIIfPABTJsGVWwSqAK9iAgH8/LWtWhLkfIE+/bBmDHQujW89RaMHg3LlsGf/1wZTS01BXoRSSgl1ahxU2i8+yefQMeOcOut0L27c/P1ttugZs1K+gSlV+KEKRERvwgG9GAOPnijtW7d4nn5oNRUJ8hn/HUnXDUMnn0WmjeHf/0LevasvMaXgwK9iPheVpZzM9Wtx56X5x3kjYGNX+U7efdWw+DHH53e+513Qr160W10BVLqRkR8raS0TDhnHvGFk3e/+monH79smZOPLxLkK3NZwLJQoBcRX3MbLllUcnLhkgb1+IWxNW7lv9+f4AyVnDYNPvwQ2rUr9t7KXhawLBToRcTXSqo0WahGTYqlF/9iXfU23Lh/DNWuuBzWrYPLL3fyOC4qe1nAslCgFxFfC7dqU3KycyO2Xz+YNGwjHyefz+tcQLM2DZxZrVOmkDU/OWxaplTj7mNEgV5EfM2r
0uTAgU5J4Z9y93KbfYi3vmlDg2XvsbTPI7BkCXTpEnZpwGDQ9/oiidaygGWhQC8ica2kG6FelSbnzYP0vA9YTgceYgTzOZvW5HDhp0MLxsS7pWWCE6qCufhzz3X/IqmMOvMRs9bG/NGpUycrIlJaM2dam5RkrRN+nUdSkrM9rO3b7TT6Wwv2a1JtD/5d8H5jDu5mTOFjuz1SU53zpaY6+wefVwYg20YQY431mu9bidLT0212dnasmyEicSYtzX3YZGoq7svz5efDlCkwfDh7d//CGIZyP3ewhyTX93odP5QxzmFjwRizxFqbXtJ+St2ISJXnlZ4p1Y3QFSvgtNPg2mvh+ON5c/QKRiU9UCjIF025uOX3i6pKuXgvCvQiUqWFG6ce0Y3Qn3+Gm2+GTp1gwwaYPh3ee4/zbmvtuUpUUGh+H4qPsKxyuXgvkeR3ov1Qjl5EvKSmhs+Ne+bo8/OtnT3b2mbNnBcyM63NzS1XW2KVi/dChDl69ehFpEoKpmu8cuSbN4dZu/WUr6BHD7j4YmjUCBYuhKefhoYNy9Umz+UDqzgFehGpciKpTxNMzxQKvut+J+Pr+6FtW/joIxg7FrKzoXPnSml3VaXqlSJS5ZRUn8Y1N/7ee84sqHXrnJ78uHHQrFlU2xkv1KMXkZhyG1ETrnxAsZum33/vTFft2tVZ+WnePHjlFQX5EAr0IhIzXiNqvFLpwTHuGRnAgQMwaRK0bAmzZsEdd8CqVXDOOQXHrsqlgyuTUjciEjNelR/r1nXSM6GvFUrXLF0KAwbA5587PfmJE52AH+C1khTEzw3UiqQevYjEjFeKZtcuj9E05/0EQ4bAiSc60XvmTHjnnUJBHuKjdHBlUqAXkZgJN+Gp0Giary0ZNV+GVq3giSec3vy6dc5OIbOYIhmSmYjKFeiNMTcZY1YbY1YZY140xtQxxhxtjFlkjFlvjJlljKlVUY0VEX/xKiFcaETNhg1w9tnwj39A06awaBFMmAANGhR6X2mGZCaaMgd6Y0wz4AYg3VrbDqgO9AZGA2OttS2A3cBVFdFQEYlvoTdHGzVyHv36Ofn45GSnYx66EEjL1N9YedE9zvJ9CxfC44/D4sVO2sZFmYZkJojypm5qAHWNMTWAJGAb0BWYHXh9OtCrnOcQkThXdHRNbq7zCP6+Z4+Tjdmzx3ne1b7Dvzcfz/Gv3c3GE3o567YOHgzVq5e6wBm417FJJGUedWOt/cYYMwbYDOwB3gKWAD9Ya/cHdtsKaDCrSIIrqbedl+cE4sYHtjGFm+nDS6znD5zFm/xv21lsPNLZL9xompSUUpYsTiDlSd0cBvQEjgaOBOoB57js6lrw3hiTaYzJNsZk79ixo6zNEJE4UNJN0GocYMCBJ1lLKy7kNe7mLv7IF7zNWYXeG240TUT5/gRVntRNd+Bra+0Oa+0+4DXgVKBBIJUDcBTwrdubrbWTrbXp1tr0xo0bl6MZIlLVhbsJ2olsFnEyTzKYRZxMO1ZxD3fzO3WKvTdc/XnPAmcJmq4JVZ5AvxnobIxJMsYYoBuwBngPuDiwT39gTvmaKCLxxC2H7tbbrs+PPMH1LOYkjuIbpnZ/kQvqvskGWhTsU7RHXlL9+XitLhl1kdQy9noA9wBrgVXADKA2cAywGNgAvALULuk4qkcvEt+CddqDa66G1ocPPk9Odh6GfJt5SJb9zhxu91PNPnvoYDtr8g+FjuNV773Ma8T6FFozVkQqQ9EbpOEcX+d/zD92EE1XL4D0dHjqKWflp1Keb+RIJ12TkuL0+BO15x7pmrEK9CJSLpEsoF2HPYzgQYYxmt9NHeo/+aCzdmv16pXSRr/S4uAiUialrfpY0oias3iTL/gjd3Ifs7mYlnYdDBqkIF+JFOhFpEC4hbi9eN0gPZJvmMXfeZOzOUB1uvEOfcmiduoR0Wm8eFKgF5ECZan6WHRETXX2M4Tx5NCa85nLHdzH8azkXbppXHuMKNCLSIFw49S9hI5fP5lFrKh5
IuO4kfp/PZU3H13NzNQ72Gdqa1x7DGnhEREp4FVGoKSqjxnn7ibj49vh6aehcVMY9zJcfDE9jaHnzdFpq0ROPXoRKVDqMgLWwowZTp34yZOdRUFycuCSSwrViZfYUqAXSSAljagpVRmBnBxnGb/LLoOjj4YlS2DsWKhfvxI+iZSGUjciCSLSdVQzMkrIo+flOV38Rx6BevWcSU/XXON8e0iVpCsjkiAqZB3VefOchUAeeAD69HGW87v2WgX5Kk5XRyRBlGVETYEtW+Cii6BHD9ZvrcNfeI+0D6aT9XaTCm2jRIcCvUiCKKnyo6t9++DRR6F1a/b/ex531XyAtvuW8z5nRDSZSqoGBXoRnwvegN20qfhAmLAjahYudAqPDR3Ku/mn02LfGu7dN4J91CrYpdSpH4kJBXoRHwstaQDOaMhgsPccUbNrl/OmU0/l16276F3rNbrteYONHO16johSPxJTCvQiPuZ2A9bag+uoFgry1sJzz0HLlvDss3DLLZxYL4dZey8AvMfEWxtZ8TOJHQV6ER+L+Abs6tVw+ulwxRXQogUsXQpjxrB26yERnUf5+qpNgV7Eh4J5ea/lJgpuwP76K4wYQX77Duz+ZDXX8AxHf/MxWV8cX3i/CChfX3VpwpSIz5S04lPBDdi5c+GGG2DTJmZWv4Jb8kezk8aw+eBEqlGjih8rKcn72MrXV03q0Yv4jFtePig1FWY+sJmMV3pBz55wyCFccviH9D/wrBPkA4K9c6+SCKmp7scvzV8AUnnUoxfxGa9edU32sXHQWLj9HienM3o03HQTr9auGfY4XiUR3Hr6qjVfNalHL+Izbr3qLnzMypodYdgw6N7dKUh2221Qs2aZJlKVqviZxJwCvYjPhJYaTmYnU7mSj/kTzev/BHPmOI/U1LJPpArIyHCGaObnuwzVlCpFgV7EJ4KBu18/SKqTzw31prKOlvRjBqv/Nox6m9bA+ecX7FvqiVQSt5SjF/GB0JE27fiCp3YNoAufsv2406j56iTatmtXaP+SJlKJv6hHL1KFhC4M0qiR8/BaJCTUyJFg8n7hYW5lGSdwHP/jcqZx0m8fOmWFiyhXJUuJO+rRi1QRRce/5+YefM1rkRAArKXjpn8xjiGksIVnuJrhPMQukjFb3M9V1rVhJT6pRy8SY8FefN++3uPfwWPm6ddfw3nn8RoXspvDOJVPyOQZdpEMeAfuUq8NK3FNgV4khoreFC1JQWpl71548EFo2xbef58ll47hT3WXsJBTC/YNF7g1PDKxKNCLxFC4WaxuUlKADz6ADh3g9tvhnHMgJ4dOWbcw6ZkapQrcGh6ZOMqVozfGNACmAO0AC1wJrANmAWnARuDv1trd5WqliE+V5uZnat3tvNv8VjjjeSfX88Yb0KNHweslLuotCau8PfrxwHxrbSugPZADDAcWWGtbAAsCz0UkREnVJQGSk51HNfIZ3nAy66q14phFLzo9+dWrCwV5kXDKHOiNMfWBPwNTAay1e621PwA9gemB3aYDvcrbSBE/KSkvn5QEM2fCzp2wc8EKDnTuwoO7rqX2ie1hxQr3O6kiYZSnR38MsAOYZoxZZoyZYoypBxxurd0GEPipZeJFQpRUXXLyZMg4/2e4+Wbo1Am+/BKefx7efRdat67cxoovlCfQ1wA6ApOstScAv1KKNI0xJtMYk22Myd6xY0c5miESX7zy8sbAxq8tGbVnOwF93Di4+mpYu9apa1C0IA2FJ1hpOT/xUp5AvxXYaq1dFHg+Gyfwf2+MaQoQ+Lnd7c3W2snW2nRrbXrjxo3ddhHxJa+x7V2afuXk3S+5BBo3hoUL4amnoGFD1/1DU0DWajk/8VbmQG+t/Q7YYoxpGdjUDVgDzAX6B7b1B+aUq4UiPlM0xV6L37mr5ije29EWPvoIxo6Fzz+Hk092fX+4CVZazk/clHfUzWAgyxizEugAPAA8BJxpjFkPnBl4LuIrXimTSFIpoZOVuvIua2q05+59d1Cj
13lOmubGG6FG4ZHPweMa42Rxwk2wUr0aKcZaG/NHp06drEi8mDnT2qQka52EifNISrJ24ED37TNnuhzku++szchwdjrmGGv/+99SnS/cIzU1ah9dqhgg20YQY40NN5C3kqSnp9vs7OxYN0MkIsHFOoqqXh0OHCi+vVDp3wMHnO78iBGwZ4+z4tOIEVC3bqnP5yYpSaUMEokxZom1Nr2k/VS9UqSUvFIjbkG+0P5Ll8LAgbB4MXTrBhMmQMuW7m+K4HxFpaY6+X8FeSlKtW5ESslr1Ez16u7b2x71I9xwA5x4otM1z8qCt9+OKMiHO19QcIKV6tWIFwV6kVLyKvGbmVl0u6VfrVks+rk1PPmk05tfuxYuvdR1THxpzqdl/6Q0FOhFSsmrxO/EiQe3t2A9H9T5K8/v7U3SMU1h0SIn2DdoUCHnmzHDufWqXrxEQjdjRcLIynLGpW/e7KRQSsyB//YbjB7t1IqvXdt5w8CB3nkdkXLQzViRciq6tF/Y5fzAybsPGgQbNkDv3vDYY9C0aaW1V8SLUjciHtyKj7nOPN22Dfr0gbPOcnIrb70FL76oIC9VhgK9iAevYY0F2w8cgCefZO+xrfj9pde5m7tp+ftKsrafCRSeJduokfNQ8TGJBQV6SVgllSvwGtaYksLBWjSDB/Ph751pyyru4S7+t7kOmZlOBie04FhurvNQ8TGJBQV6SUiRVH50G9bYtO4PvNXiOifIf/st1zd6iTPz5/MlfyjYJy/PGSUTbi1YFR+TyqRALwkpkvx7oWGNWAY3eoEva7XiuHefgsGDYe1aJub+Ayg+Jt5rlmwoFR+TyqJALwmpxPx7QEYGbHzrf+R3O5PHd2ZQ97gUJ20zfjzUr1/qWbKhSprxKlJRFOjF99xy8V5B1tqQfP2ePXDnnfDHP0J2Nosvn8gx3y+kWnrHgn0inyVbfJ9Royrk44mULJISl9F+qEyxlNfMmU55XmOcn8HSwKUpKRz66Fn7v/anJsc6TzIy7Ownt3mWIA537uD25GTnUXQfkfIgwjLFMQ/yVoFeyskrmAcDrVfNdrfXj2SrncUl1oLNoaXt3WRBiccRiZVIA71KIEjc86rXnprq5Nzd/okbA/n5zu/VqkE1u5/rmMD93EEN9nM/dzCGoeylNklJ3iNoQo8jUtkiLYGgHL3EvXA3VsOOhQ84//BFfM6JjOdGPuY02rGKBxjJXmoDTpD3urmqG6oSDxToJe6FC+ZeN0tHjQJ274YBA3j9+1NoYnZwEbM5l3l8xbHFjnXgQJjjiFRxCvQS98IFc9eSwk9bMvJnOAt/TJmCufFGPp6cw5LUi3AbEw8HSxEXLU2sEsESFyJJ5Ef7oZuxUl5eI1+KWbPG2jPOcO6kdu5s7fLlxY4T8QLfIjFGhDdj1aMXX8jIcBbhyM/3WIwjLw9uvx3at4fly+Gpp+CTT5znRY6jnrv4jerRi//95z9w/fXON0D//vDww9CkiefuGRkK7OIv6tGLf23ZAhdeCH/7G9StC++/D889FzbIi/iRAr34z759LMl4lF9TW5P3+nxGN3iQF4cth9NPj3XLRGJCgV6qpJJqxXv69FN2H9uJTi8M5T17Bm1Yw/AfhnP1oFqq/y4JS4FeqpxwteI9vwByc+Gaa6BLF/Z8u5tevM55/JtNpAGq/y6JTYFeqhyvWvFDhrh8AVxjWXjtc9CqFUybBkOH0vJADnPoRdEx8Zs3l+MvBZE4pkAvMeUWeL1KGuTmFv4CaMsq5u/5M6dMvsKZ/LRsGTzyCMmph7i+v2HDkleVEvGjcgd6Y0x1Y8wyY8wbgedHG2MWGWPWG2NmGWNqlb+Z4kdeKZqGDcO/L4lfeYhhLOME2rCGq5kCH37o1I3He6YslLyqlIgfVUSPfgiQE/J8NDDWWtsC2A1cVQHnEB/yStGAe6BOTobzmMsa2jCMh3mey2jJOt5Jvcr5kyDAa9LTrl3u7dCSfuJ35Qr0xpijgB7AlMBzA3QFZgd2
mQ70Ks85xL+8AuyuXcUD9cwHNrMsrRdz6cnPHMppfMTVTGVPUiPXwmJuM2UjqWQp4kfl7dGPA24DghW5k4EfrLX7A8+3As3c3miMyTTGZBtjsnfs2FHOZkg8Crec38iRTgom//d9bBz0MBfc3prmOW+ztPfD9EpZxqfmtFKXJwhbyVLEx8oc6I0xfwO2W2uXhG522dV1ZRNr7WRrbbq1Nr1x48ZlbYbEMbfAG7RpE0y76mN+OOYEGDYMzjwT1qyh44u3smFTTfLznfePHBn5CBrVsZGEFUnlM7cH8CBOj30j8B2QB2QBO4EagX1OAd4s6ViqXulPkVSUdFumL5kddipXWAt2S/UUa+fMcX2fqkxKoqMy14wFzgDeCPz+CtA78PtTwKCS3q9A7z+lDcTGWGs4YK9kit1JQ7uXGvZBhtl6/OK6v9ZwFYltmeJhwM3GmA04OfupUTiHVHFeI2q8hjKedcRKPuJPTOVqVtOWDixnBA/RKLVeqcbaawSNSHFaHFyiolq1khflBuCXX+Duu8kfO47c/MO4lUeYTn/AkJTkVBWePr3wl0ZSklOMMje3+PFTU51RNiKJQIuDS0wEe99e/YeCkTbWwuuvQ+vW8OijVLvqSt6ftJb3Uy/HGFNwo3TevNKNtdcIGpHiFOglYl51YoLbjYF+/ZwRM24KAvHGjXD++U6t+IYNnZWeJk/mkgHJxca+l2asvUbQiLhToJeIeJUrGDTo4Hbw7smnpsKUiXvJ2Pwg+1u24df/vMdQxnDsD0vI+vpUz/OGm+RU4vKBIgJoKUGJkNfN1cmT4cCB8O81BjY+977zrZCTw7+rX8RgO45vOAo2O18U4B6oR41yXi+ao1eKRiRy6tFLRLxSKCUF+cZs55Wk/vCXv8Bvv3F5k/9w4YHZTpAPCDcaR5OcRMpPgV4K8crDe6VQqld3327IJ5OnWUsrev32Itx+O6xaxfM7znXdP9ywSKVoRMpHgV4KhFvZyatOTGZm8e0dWM6nnMrTDGBv6/ZU/2JFwQFUWEyk8inQS4Fwk5y8UigTJx7cfig/M+XQm1hiOtG5ydcwYwZHrH7XGUIZoMJiIpVPE6akQMSTnIqyFl591Vnrb9s2uPZaeOABOOywgl2yspwvjM2bDy4ssmuX05MfNUrpGJGyiHTClEbdSIGUFPcx8GHTKl9+CddfD/PnQ4cO8NprcPLJhXYJpoSCfy3k5jq9+BkzFOBFKoNSN1KgVGmV33+H++6Ddu2cCU/jxsHnnxcL8lD6ujciUrEU6KVAxEMZ330Xjj8e7rzTmeGak+OkbWq4/4GoAmQisaVAL4WEDmUsurDHqxO/h759oVs3ZwD9/PkwaxY0c11ErIBG2ojElgK9uAodamnsAc7dNJFu17XkwKxX4J//hC++gL/+NaJjaaSNSGwp0IurYF79BJaykFOYyHVkk86ZTVbCvfc6dYIjpNmtIrGlQJ+gvGbABv2w6UfGcwOfcyIpbOZSsjiTt3l/W8synU+zW0ViR8MrfS50/HpwzDoUHu4YnAELkHGphZdfZl31m2h84DsmMog7uJ8faQAory4Sj9Sj9zGvkgZDhrgPd3zmtvVO3r13b2o0b8rptRcxmCcLgnxoXr2kvwhEpOpQoPcxr/HrRZfgq81v3Mk9zP/2j7BoETzxBMkbFjNg6omuefVwNXFEpOpRCQQf8yppEKo7bzORQbRgA3OTenP+hsegadOw70lLc59Bq/VaRSqX1oz1qdKkTLzy6cnJcEydb3mBPrzNWVgMZ/I2PfNeJO2UpiX2zDUBSiS+6GZsHClaMyb0JioULxqWm+ukXUJ79YfUPcD8cyfQ/pU7yGcvd3E3DzOM36hT7JheI2PKVBNHRGJGqZs44pUySU6GPXuK5+ODgsH+vCM+Z1qdASRvXApnnQUTJpDW/Q+lTsMU/cIB50atxsaLVC6lbnzIKzWSm+sd5AHq2x+Yfsh1zPn+ZPZv2cY/mEXa2vlkLfpDmdIwmgAl
El/Uo48jXj16b5ZLeYFHuYXG7OCpGoMZsf9efqY+4PTC69YtPgoHdGNVJB6oR+9DXjVjkpOL73sc63iH7mTRl02k0rna51y/f1xBkIeDfwWoDo2IvynQxxGvlMn48QeDdR32cC//ZCXH05GlDGAS3et+SnZ+R9dj7tqlNIyI3yl14xNZWfDWzfO5c/t1HMtXvFyrLzfsHUOd1MMLyg1r7LuIv0R9KUFjTHPgeeAIIB+YbK0db4xpCMwC0oCNwN+ttbvLeh6JwDffkPGvG8nYPhtatoSJC/h71678vchubiNllKIR8b/ypG72A7dYa1sDnYHrjDFtgOHAAmttC2BB4LlEwQvP7+e+hmP5+ahW/PbqGyy/5H5YsQK6di22r0bKiCSuMgd6a+02a+3SwO8/AzlAM6AnMD2w23SgV3kbKcXNv+cz2l2ezj9338xH/Ik2djVd/jOSrNm1Pd+jUsEiialCbsYaY9KAE4BFwOHW2m3gfBkATSriHIkgovIGu3fDgAGcdfepNLQ7uZBX6cF/+JpjtOC2iLgqd6A3xhwCvArcaK39qRTvyzTGZBtjsnfs2FHeZsQ9t4qQ/fo5aZZGjaBRsqW/eZ6djVqSP/kZHmcIrcnhdS4ETMFxVG9GRIoqV6A3xtTECfJZ1trXApu/N8Y0DbzeFNju9l5r7WRrbbq1Nr1x48blaYYvuJUUDg6Iapybwyu7ujKd/qzPP5YutZdwf/JYfuHQYsdRvRkRKarMgd4YY4CpQI619rGQl+YC/QO/9wfmlL15/uOVnnHridclj1Hczgra054VZPI0XfiEz37rAGiik4hEpjw9+i5AP6CrMWZ54HEu8BBwpjFmPXBm4LkQfsGOoj3xHrzBatpyOw/yApfSirU8QyY2cMk00UlEIqUJU5Uo3IIdo0Y5Qb9h3hbGM4QLeZ01tGYgk/iQ013fo4lOIolNtW6qIK8bpZs2wRV99zGUMeTQmrOZzwgepAPLXYO8UjQiUhoK9JXI60bpqXxCNp24J+9WPqz2F94au4Z2M4dzZGotjHGKliUnK0UjImWjFaYqUTA9Exxd05BcRjOMq5nKZprTi9eZk9+T1HFGE5pEpMKoR1+BSprwFCxDkJaSzxVMYx0tuZzneIShtGFWjyX7AAAJyUlEQVQNc+gFGI2FF5EKpR59BYl0PdfuR6xi0aEDacLHfF67C1f+PolV/LHQsTQWXkQqkgJ9BXGb8JSXB0OGOOu5kvcrD3EPN20by4/b/o+F10zlqz9dzlcDqoEqSopIFCl1U0HCrefaPW8Oa2jDbTzC81xGK9bS560ryehXTWPhRSTq1KOvICkpxcfIp7CJx7mBnszlC9pxGh/xCacBsCvwxZCRocAuItGlHn0FCV3PtSZ7uY3R5NCa7rzDrTxMR5YWBHlQHl5EKo969BUk2Cufc8uH3PX9QNqyhi2depHdbzwTb09hv/LwIhIj6tFXlB07yHjnCl7+/nTapv4Kc+fSPPt1LhiSojy8iMSUevTllZ8Pzz4Lt90GP/8Mw4fDHXdAvXoFuygPLyKxpEBfHitXwoABsHAh/OlPMGkStG0b61aJiBSi1E1Z/PILDB0KHTvC+vXw3HPwwQeFgnxEywKKiFQCBfrSsBZee41fU1vDo48y5cAVdKizlqwa/Z0EfEC4uvMiIpUtYQN9JD3u0H1Oa/Y133Q8Dy66iK92N+QUPuUanmHF1uRiQdxrlqwW7haRWEjIQB9Jjzu4z7eb9jLcPsDb37ah/vIPuJlHOcEu4TNOKdg3Lw/69j34heE1S1bFykQkFhJyhalwKz0FV21KS4O0Te8ziYG0Zi2zuYgbGcc3HBX22ElJULeuU/og3PFFRMpLK0yFEa7HnZUFnZpv595Nl/E+f6E2v9ODN7iE2SUGeTiYstHC3SJSVfgu0IfLvQdf8/ojJvmwfD674mne2dqS3rzE/YykHauYR49StUELd4tIVeKr1E3RmvAANWtC/fpOKsUY7yB/Sp1l
PL5/IOn7F/EeZzCQSayjVZnaoRSNiFSGhEzduI122bfvYL7cLcgfyk9MqnMjH/2WTsr+r+jLDLrybrEgHzJ6EnBSMQMHKkUjIlWfrwJ96Ua1WC7mFXJoTeZvjzOZTFqyjiz6AoWjemoqzJhRPBUzcaJSNCJS9fkqdeM1mqaoY/iSJ7mec5jPUk5gIJNYzMmu+yYlKXiLSNWUkKmb0JrwbmrxO3dwH6tpSxc+YWjN8ZzEYs8grx66iPiBrwJ9RkbhVEpyMtSq5bzWlQWs5Hju407m0JPuzdZywrQbOCrVva5b8IaqgryIxDtfBPrQIZUjRzo9+/x82LkTXnjsO15PymAB3ald4wALhr3JP+wsFm89kowM978CdENVRPwk7ssUFx1SGSxnYPIPcOlPT3HRyJGwfw/ceSdpw4eTVrduofcHe+wjRzo3c1NSnCCvnryI+EXc34x1uwHbkSU8W2sA7fdmQ/fuMGECHHdc+RsqIlKFxPRmrDHmbGPMOmPMBmPM8GicIyh0SGV9fuRxBrOYk2iydyu88AK89ZaCvIgktAoP9MaY6sAE4BygDdDHGNOmos8TlJICYPkHL7GWVlzHBCYyiLOar4U+fYrPdBIRSTDR6NGfBGyw1n5lrd0LvAT0jMJ5AHh88HoWVDuLl+jDNzTjJBYzPOkJhj/4f9E6pYhIXIlGoG8GbAl5vjWwreJNm8b5t7fjtNqL+WfDJ+nMInampmvsu4hIiGiMunHLlRS742uMyQQyAVKc/EvptW8PF19MrTFjuK9pU+4r21FERHwtGj36rUDzkOdHAd8W3claO9lam26tTW/cuHHZztSxozO+smnTsr1fRCQBRCPQfw60MMYcbYypBfQG5kbhPCIiEoEKT91Ya/cbY64H3gSqA89aa1dX9HlERCQyUZkZa62dB8yLxrFFRKR04rbWTbglA0VE5KC4rHXjVd8GNKxSRKSouOzRuy0ZmJfnbBcRkcLiMtB7LRlYuqUERUQSQ1wGeq/5VWWddyUi4mdxGei1WIiISOTiMtAXXTJQa7uKiHiLy1E34AR1BXYRkZLFZY9eREQip0AvIuJzCvQiIj6nQC8i4nMK9CIiPmesLbb4U+U3wpgdwKYyvr0RsLMCmxMvEvFzJ+JnhsT83In4maH0nzvVWlviyk1VItCXhzEm21qbHut2VLZE/NyJ+JkhMT93In5miN7nVupGRMTnFOhFRHzOD4F+cqwbECOJ+LkT8TNDYn7uRPzMEKXPHfc5ehERCc8PPXoREQkjrgO9MeZsY8w6Y8wGY8zwWLcnGowxzY0x7xljcowxq40xQwLbGxpj3jbGrA/8PCzWba1oxpjqxphlxpg3As+PNsYsCnzmWcaYWrFuY0UzxjQwxsw2xqwNXPNTEuRa3xT4973KGPOiMaaO3663MeZZY8x2Y8yqkG2u19Y4Hg/EtpXGmI7lOXfcBnpjTHVgAnAO0AboY4xpE9tWRcV+4BZrbWugM3Bd4HMOBxZYa1sACwLP/WYIkBPyfDQwNvCZdwNXxaRV0TUemG+tbQW0x/n8vr7WxphmwA1AurW2HVAd6I3/rvdzwNlFtnld23OAFoFHJjCpPCeO20APnARssNZ+Za3dC7wE9IxxmyqctXabtXZp4Pefcf7jN8P5rNMDu00HesWmhdFhjDkK6AFMCTw3QFdgdmAXP37m+sCfgakA1tq91tof8Pm1DqgB1DXG1ACSgG347Hpbaz8EdhXZ7HVtewLPW8dnQANjTNOynjueA30zYEvI862Bbb5ljEkDTgAWAYdba7eB82UANIldy6JiHHAbkB94ngz8YK3dH3jux+t9DLADmBZIWU0xxtTD59faWvsNMAbYjBPgfwSW4P/rDd7XtkLjWzwHeuOyzbdDiIwxhwCvAjdaa3+KdXuiyRjzN2C7tXZJ6GaXXf12vWsAHYFJ1toTgF/xWZrGTSAv3RM4GjgSqIeTuijKb9c7nAr99x7PgX4r0Dzk+VHAtzFqS1QZ
Y2riBPksa+1rgc3fB/+UC/zcHqv2RUEX4HxjzEaclFxXnB5+g8Cf9uDP670V2GqtXRR4Phsn8Pv5WgN0B7621u6w1u4DXgNOxf/XG7yvbYXGt3gO9J8DLQJ35mvh3LyZG+M2VbhAbnoqkGOtfSzkpblA/8Dv/YE5ld22aLHWjrDWHmWtTcO5ru9aazOA94CLA7v56jMDWGu/A7YYY1oGNnUD1uDjax2wGehsjEkK/HsPfm5fX+8Ar2s7F7gsMPqmM/BjMMVTJtbauH0A5wL/A74ERsa6PVH6jKfh/Mm2ElgeeJyLk7NeAKwP/GwY67ZG6fOfAbwR+P0YYDGwAXgFqB3r9kXh83YAsgPX+1/AYYlwrYF7gLXAKmAGUNtv1xt4EecexD6cHvtVXtcWJ3UzIRDbvsAZkVTmc2tmrIiIz8Vz6kZERCKgQC8i4nMK9CIiPqdALyLicwr0IiI+p0AvIuJzCvQiIj6nQC8i4nP/D0AaLsj+WYXQAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "N = 100                                              # setting number of points\n",
    "data_x = np.arange(N)                                # Generate an array with the integers 0 to N-1\n",
    "rdm = (np.random.rand(N)-0.5)                        # rand(N) returns N random numbers in [0, 1); subtracting 0.5 centres them on 0\n",
    "data_y1 = data_x + rdm*10                            # Linear wrt x, with noise\n",
    "plt.scatter(data_x, data_y1, color='blue')           # Scatter plot; color parameter is optional\n",
    "plt.plot(data_x,data_x, \"r-\")                        # Line plot to show 'true' function without noise in red\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reading Datasets\n",
    "To perform machine learning, we typically need a significant amount of data. By understanding the data, analysing patterns and training our algorithms, we can achieve meaningful results. Scikit-learn makes it easy for us to access some pre-defined 'toy' datasets to practice our understanding.\n",
    "\n",
    "In this example, we'll use the \"diabetes\" dataset, which contains records for 442 diabetes patients. The 10 features in the dataset represent each patient's age, sex, body mass index, average blood pressure, and six blood serum measurements. The response of interest is a quantitative measure of disease progression one year after baseline. We'll use this dataset to work towards a regression that predicts a patient's disease progression from their features; as a first step, the code below finds the feature most strongly correlated with the target.\n",
    "\n",
    "Read through the code below to understand how this particular data is structured."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAAD8CAYAAADUv3dIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvDW2N/gAAHh1JREFUeJzt3XucXVV5//HPl8kAkgRCDBpIooFwURQUCAhCARUK2haqFBC1fXmNl1JQqr9KgVTxUi9tVRSBIBCvWKWIaYmCVm5FSROBAOFm5JYEXiKQQLhnZp7fH3sPHKZzztnnzFln9tn5vvPar5zLPs9eZ2bOM2vWXns9igjMzKw7NhnvBpiZbUycdM3MushJ18ysi5x0zcy6yEnXzKyLnHTNzLrISdfMrA5J50t6UNItdZ6XpDMkrZR0k6Q9m8V00jUzq28hcHiD598M7JRv84CzmgV00jUzqyMirgYeabDLkcB3InMdMEXSto1iTuhkA0ez4aG7kl7ytv/u70kZnn71JY0PcHD/9KTx746nksZfO/R00vgA1639XdL4e07ZIWn8mRMmJ40PMDnxx3n5hoeSxge4Zs1/a6wxWsk5m24z54NkPdRhCyJiQQuHmwGsqrm/On/sgXovSJ50zczKKk+wrSTZMXPSNbNqGRrs5tHWALNq7s/MH6vLY7pmVi2DA8W3sVsE/E0+i2Ff4NGIqDu0AO7pmlnFRAx1LJakC4GDgWmSVgP/BPRnx4mzgcXAW4CVwJNA05NMTrpmVi1DnUu6EXFck+cD+NtWYjrpmlm1dLCnm4KTrplVS3dPpLXMSdfMqqVKPV1JW0TEk6kaY2Y2VtGZWQnJFJoyJun1km4Fbs/vv0bSNxvsP0/SMknLvvWdCzvUVDOzAoaGim/joGhP9yvAYWRz0oiI5ZIOrLdz7VUeqS8DNjN7gaoML0TEKukFl0WXe7TazDZOFTmRtkrS64GQ1A+cCNyWrllmZm2qSE/3Q8DXyFbPWQNcTosTgs3MuqLkJ9IKJd2IeAh4Z+K2mJmN3TidICuqUNKVdMYoDz8KLIuIn3a2SWZm7Yso95hu0VXGNgdeC/wu33YnW8LsfZK+mqhtZmati6Hi2zgoOqa7O7B/5L9CJJ0FXAMcANycqG1mZq2rwvACsDUwiWxIAWAiMDUiBiU9k6RlZmbtqMjshS8BN0q6EhBwIPB5SROBXzZ6YeoaZtfedEHS+DPnvCVpfICXTpmUNP6iB36bNP4uW89MGh9g/tR9k8Z/cJO0H9RLn747aXyAl/RvmTT+kofuSBq/YwY3jHcLGio6e+E8ST8D/ppsfu7lwOqIeAL4RML2mZm1pgrDC5LeT3ZBxEzgRmBf4DfAG9M1zcysDSUfXig6e+FEYG/g3oh4A7AHsC5Zq8zM2lWRBW+ejoinJSFps4i4XdIuSVtmZtaOKgwvAKslTQEuAX4haS1wb7pmmZm1JypyIu2t+c1PSboC2Ar4ebJWmZm1q+Rjui2X64mIq1I0xMysIyoyvGBm1htK3tMtWq7nM5Im1NzfUlLaqxLMzNpR8tkLRaeMTQCWSNpd0qHAUqDuZU61NdIefPKBTrTTzKyYKix4ExEnS/olsARYCxwYESsb7P9cjbR9tjvINdLMrHsGyr2IedHhhQOBM4DTgSuBr0vaLmG7zMzaU4WeLvAvwNERcSuApLcBvwJekaphZmZtqcjshf2iZjn2iLhYkqeOmVn5VGH2AjBN0nmSfg4gaVfgL9M1y8ysTRWZvbAQuAzYNr9/J/DRFA0yMxuTko/pFu7pRsSPgCGAiBgAyl39zcw2TgMDxbdxUHRM9wlJLwYCQNK+PF+6x8ysPKLcs1SLJt2TgEXAHEnXAtsAf1Xkhf3qa7NpxaQup7P694uTxgc4d4/5SeP3bTs3afz+wn8wtS91Ib6dN6R9Dzv3zWGrwbR/zq5J/H04f7ceWc21IrMX5gBv
BmYBRwGva+G1Zhu91AnXapQ86Rb91XhaRDxGVhX4DcA3gbOStcrMrF0VOZE2fNLsz4BzI+JSYNM0TTIzG4PBweJbE5IOl3SHpJWSPjnK8y+TdIWkGyTdJKnpeGfRpLtG0jnAscBiSZu18Fozs+7p0DxdSX3AmWRDq7sCx+XXKNQ6FfhRROwBvJ1sFKChoonzGLJ5uodFxDpgKi69bmZl1LmLI/YBVkbEXRHxLPBD4MgR+wSwZX57K+D+ZkGLrjL2JHBxzf0HAK/ZaGbl08JYraR5wLyahxbkqyQCzABW1Ty3mmwSQa1PAZdL+jtgInBIs2N6BoKZVUoMFZ+nW7sMbZuOAxZGxL9K2g/4rqRXR9TP/E66ZlYtnZsytoZsmuywmfljtd4HHA4QEb+RtDkwDXiwXlCfDDOzaunc7IWlwE6Stpe0KdmJskUj9rkPeBOApFcCmwN/bBS0YU9X0o8i4hhJN5NfAjz8FBARsXuzVpuZdVWHeroRMSDpeLJJBH3A+RGxQtLpwLKIWAT8PXCupI+R5ch3RzS+DrnZ8MKJ+f9/3kpjawend9hqF6ZPdJEJM+uSDl6RFhGLgcUjHptfc/tWYP9WYjZMuvksBSLiXsiqADd7Tb7/c4PT+894Y7lXnzCzaqnCgjeSPgh8Gnia54cZAtghUbvMzNpT8rUXis5e+Djw6oh4KGVjzMzGrIUpY+OhaNL9PfBkyoaYmXVEgTUVxlPRpHsy8GtJS6hZ2jQiTkjSKjOzNkVFhhfOISu5fjN5yR4zs1KqyPBCf0SclLQlZmadUPIS7EWT7s/yubf/yQuHFx5J0iozs3ZVpKd7HNkUsZGL+DadMnZw//RW29SSl06ZlDR+6vplAB+44fSk8X+510eTxr9/IH2N0rtZlzT+9hOmJI3PBDhgaGLSQyhpdFiyJu1nGeBlnQgyUI0TabsCHwEOIEu+1wBnp2qUWdWkTrhWoyLDC98GHgPOyO+/I3/smBSNMjNrW0WGF14dEbVlKq6QdGuKBpmZjUXZp4wVXdrxekn7Dt+R9DpgWZommZmNwVAU38ZBs6Udh5d07Ce7OOK+/P7LgdvTN8/MrEU9PrzQ0pKOZmbjrpcvAx5e0tHMrFe0UiNtPLhGmplVi5OumVkXVWH2gqQdJP2npIckPSjpp5K8gLmZlU/JZy8UnTL2A+BHwHRgO+DHwIX1dpY0T9IySctuWL9y7K00MyuqIkl3i4j4bkQM5Nv3yEoNjyoiFkTE3IiYu8fkHTvTUjOzAmJwqPA2HlpZZexkst5tAMcCiyVNBa82ZmYlUpETacNrLHwg/394QaO34wKVZlYiVZkyNtoqY2dFxNOpGmZm1paKJN3RVhn7Dl5lzMzKptwzxrzKmJlVSwyUO+t6lTEzq5ahFrZx4FXGzKxSev1E2phXGbs7nhpriIYWPfDbpPH7tp2bND6kr2H277/9atL4g7ddmzQ+wJRDT0ka/9ztdk8a/6mBJ5LGB7iov+7U+Y5YxNqk8QGO7kSQco8ueJUxM6uWXu/pmpn1ll7u6ZqZ9ZoYGO8WNOaka2aVUvIK7E66ZlYxTrpmZt3jnq6ZWReVPekWvSLNzKwnxKAKb81IOlzSHZJWSvpknX2OkXSrpBWSftAspnu6ZlYpnerpSuoDzgQOBVYDSyUtiohba/bZCTgZ2D8i1kp6SbO4bfd0JS1o8Nxz5XpWPn5Pu4cwM2tZDKnw1sQ+wMqIuCsingV+CBw5Yp8PAGdGxFqAiHiwWdCGSVfS1Drbi4G31H3TNeV6dpw0u1kbzMw6JoaKb03MAFbV3F+dP1ZrZ2BnSddKuk7S4c2CNhte+CNwL89XioBswRsBTbvRZmbdFtF8rHaYpHnAvJqHFkRE3b/iRzEB2Ak4GJgJXC1pt4hY1+gFjdwFvCki7hulsatG2d/MbFy1MqabJ9h6SXYNMKvm/sz8sVqrgSURsQG4W9KdZEl4ab1jNhvT/SqwdZ3nvtTktWZmXTc0qMJbE0uB
nSRtL2lTspqQi0bscwlZLxdJ08iGG+5qFLRh0o2IMyNiuaSjJU3OA58m6WIg/Xp+ZmYt6tSJtIgYAI4HLgNuA34UESsknS7piHy3y4CH80o6VwCfiIiHG8UtOmXstIj4saQDgDcBXwbOAl5X8PVmZl1RYFZC8VgRi4HFIx6bX3M7gJPyrZCiU8YG8///jGyg+VJg06IHMTPrloji23go2tNdI+kcsknCX5S0Gb6azcxKqJM93RSKJs5jyMYuDsunQkwFPpGsVWZmbYpQ4W08FOrpRsSTwMU19x8AHijy2rVDT7fXsoJ22Xpm0vj9XejQ3z/waNL4qWuY9b1y/6TxAfadtnPS+DP3TlvD7LNLpieND7CetKt3rxtMW++wUwYLrKkwnrz2gplVynj1YIty0jWzSin7mK6TrplVynjNSijKSdfMKsU9XTOzLhocKvdsViddM6sUDy+YmXXRkGcvmJl1j6eMmZl1UdmHF5qV6+mT9EFJn5G0/4jnTm3wuudqpK163Gudm1n3DIUKb+Oh2Wm+c4CDgIeBMyT9W81zb6v3otoaabMmzaq3m5lZxw0ObVJ4Gw/NjrpPRLwjIr5KtnbuJEkX56uMlXvgxMw2StHCNh6aJd3n1syNiIGImAcsB34FTErZMDOzdvT68MKykSWFI+LTwPnA7FSNMjNrV08v7RgR7wKQdDTw84hYn59A2xOX6jGzEmqhGPC4KDqSfFqecA8ADgHOI6uRZmZWKoEKb+PBNdLMrFIGQoW38VA06Q7XSDsWWOwaaWZWVmXv6Ra9Iu0Y4HDgXyJinaRtKVgj7bq1v2u3bYXMn7pv0vgEPJP4e3M365LGn3LoKUnjQ/pyOr+4cUHS+M+e8Y9J4396l+C7F6S9AHRJ37NJ47+7rzfm3Jd9TDd5jbRelzrhVkHqhFsFqROuPW+8erBF+SfBzCqlEj1dM7NeMeierplZ95S8Wo+TrplVy5B7umZm3VPy5XSddM2sWnwizcysi4bk4QUzs64ZbL7LuGqYdCVtARxPNkzydeDtZBUjbgdOj4jHk7fQzKwFZZ+90Gz9hIXAS4HtgUuBucCXyapG1F1lrLZG2jMbHutQU83MmhtChbfx0Gx4YeeIOEaSyC77PSQiQtL/kFWQGFVELAAWAGw9aceyn0w0swope8IpuvZCSFockRU3zu+X/b2Z2Uao14cXlkmaBBAR7x1+UNIcYH3KhpmZtWOoha0ZSYdLukPSSkmfbLDfUZJC0txmMZuV63l/HnBkuZ69gI8WaLOZWVcNdqinK6kPOBM4FFgNLJW0KCJuHbHfZOBEYEmRuO2W6/lW3hgzs1LpYE93H2BlRNwVEc8CPwSOHGW/zwBfBJ4u0j6X6zGzSulg0p0BrKq5vzp/7DmS9gRm5TmxEJfrMbNKCRXfaqe35tu8oseRtAnwb8Dft9K+5OV6zMy6qZW1F2qnt45iDVBbo2hm/tiwycCrgSuzWbVMBxZJOiIiltU7ZvJyPXtO2aHIbm17cJO0y1vsvCF9h377CVOSxj93u92Txp+59xNJ40P6GmabnvD5pPHfdwI8cvR7kh5j8zvT1jDr75FJoh28DHgpsJOk7cmS7duBdww/GRGPAtOG70u6Evh4o4QLXnvBrCtSJ1x7Xqfm6UbEgKTjgcuAPuD8iFgh6XRgWUQsaieuk66ZVUon//aNiMXA4hGPza+z78FFYjrpmlmleD1dM7MuKvvQs5OumVVK2ddecNI1s0rp6UXMzcx6zVDJBxicdM2sUnwizcysi8rdz21j/QRJd6ZoiJlZJ3RyPd0UmhWmXM/zvziGzwluMfx4RGxZ53XzgHkAu0x5JTMmzuxQc83MGhsoeVGbZj3dC4BLgJ0iYnJETAbuy2+PmnAhW0QiIuZGxFwnXDPrpmhhGw/NKkecIGkv4EJJlwDfoPxDJma2ESv7ibSmY7oR8VuyahEAVwGbJ22RmdkYDBGFt/FQ9ETaUcBCsnV1z5Z0cb5iuplZqZR9
eKGVGmmPAXOANwDnAWcla5WZWZvKPnuhnRpp57pGmpmV1SBReBsPrpFmZpVSlZ7uMWSrpx8WEeuAqbhGmpmVULTwbzwkr5E2c8Lk9lpW0KVP3500/t5snzQ+wAGbTEwa/6mBtDXMPrtketL4ADv+ui9p/Ldek7acztQfX5A0PsA2rzolafxdXvxI0vidUvYpY157wcwqxauMmZl1UblTrpOumVXMQMnTrpOumVXKeJ0gK8pJ18wqxSfSzMy6yD1dM7Muck/XzKyLBsM9XTOzrin7PN2GlwFL2r3mdr+kUyUtkvR5SVs0eN08ScskLbtzfdorxszMapX9MuBmay8srLn9BWBH4F+BFwFn13tRbbmenSenv4zWzGxY2Re8aTa8oJrbbwL2jogNkq4GlqdrlplZe8o+vNAs6W4l6a1kPeLNImIDZGWApZKX3DSzjVKvTxm7Cjgiv32dpJdGxB8kTQceSts0M7PW9fTshYh4D4Cko4HLIuIxSacCe+L1dM2shMo+vNBSjTRJB5BVBnaNNDMrpbKfSGunRtoC10gzs7Iq+5SxohdHDNdIOxT4omukmVlZVWV4wTXSzKwnREThbTwkr5E2OfGVxi/p3zJp/DVd6NCr+S5jclH/5knjr2cgaXyAJX3PJo2/+Z2zksZPXb8M4E9XfC5p/G/uOT9pfIATOxCjk6XVJR0OfA3oA74VEV8Y8fxJwPuBAeCPwHsj4t5GMT1EYGaVMkQU3hqR1AecCbwZ2BU4TtKuI3a7AZgbEbsDFwFfatY+J10zq5QODi/sA6yMiLsi4lngh8CRI451RT4SAHAdMLNZUCddM6uUVnq6tYtz5du8mlAzgFU191fnj9XzPuBnzdrnpR3NrFJamQoWEQuABWM9pqR3AXOBg5rt66RrZpXSwcuA1wC1Z1hn5o+9gKRDgFOAgyLimWZBnXTNrFI6OE93KbCTpO3Jku3bgXfU7iBpD+Ac4PCIeLBIUCddM6uUTiXdiBiQdDzZNQp9wPkRsULS6cCyiFgEfBmYBPxYEsB9EXFE3aA46ZpZxXTyooeIWAwsHvHY/Jrbh7Qa00nXzCqlpy8DlnS8pGn57R0lXS1pnaQlknZr8LrnpmGsWP/7TrfZzKyusi9402ye7ocjYnix8q8BX4mIKcA/ULBG2qsmz+lQU83MmhuMocLbeGg2vFD7/Esi4icAEXGlpMnpmmVm1p7xWsimqGY93YskLZS0A/ATSR+V9HJJ7wHu60L7zMxa0qm1F1JpVq7nFEnvBi4E5gCbAfOAS4B3Jm+dmVmLer0wJRGxUNKTwM/zkj2nAXsAO5CtsGNmVhpDPT68MOzUmhppbySrkVb3RJqZ2Xjp9dkLw2prpJ3rGmlmVla9PnthmGukmVlPqMrwgmukmVlPKPvwQvIaacs3PNR8pzFY8tAdSeOfv9suSeMDLFkzPWn8RaxNGn/d4FNJ4wO8uy9tDbP+xJ+/XV78SNoDkL6G2UeuPz1p/E4pe0/Xay+YWaX0/JQxM7NeMhiDzXcaR066ZlYpZb8M2EnXzCql7Es7OumaWaW4p2tm1kWevWBm1kVln73Q9OIISfsXeczMrAzKfhlwkSvSvl7wMTOzcRcRhbfxUHd4QdJ+wOuBbSSdVPPUlmTliOuSNI9s3V123GoXpk+c0YGmmpk1V/Yx3UY93U3J6rlPACbXbI8Bf9UoaG2NNCdcM+umnu3pRsRVwFWSFkbEvZK2yNdgMDMrrbLP0y0yprudpFuB2wEkvUbSN9M2y8ysPWXv6RZJul8FDgMeBoiI5cCBKRtlZtauss9eKLq04ypJtQ+Ve0UJM9tolf1EWpGku0rS64GQ1A+cCNyWtllmZu0p+2XARYYXPgT8LTADWAO8Nr9vZlY6PV85IiIeAt7ZhbaYmY1Z2Xu6TZOupDNGefhRYFlE/LTzTTIza1/Zx3TV7LeCpAXAK4Af5w8dBdwNvBi4KyI+2tEGSfMiYkEnY3b7GL0evxvH6PX43TiG
30M1FUm61wH7R2Q1MCRNAK4BDgBujohdO9ogaVlEzO1kzG4fo9fjd+MYvR6/G8fwe6imIifStia7HHjYRGBqnoSfSdIqM7OKKjJl7EvAjZKuBER2YcTnJU0EfpmwbWZmldMw6Sq7IuJyYDGwT/7wP0bE/fntTyRoUzfGf1Ifo9fjd+MYvR6/G8fwe6igImO6N0fEbl1qj5lZpRUZ071e0t7JW2JmthEo0tO9HdgRuBd4gmxcNyJi9/TNMzOrliI93cOAOcAbgb8A/jz/f6MlabakW9p87XaSLurGscZC0vGSVkoKSdMSxP++pDsk3SLp/Hxdj07GP0/Sckk3SbpI0qTmr2r7WGdIejxB3IWS7pZ0Y769tsPxJelzku6UdJukE+rsN0XSRzp57DrHOThf56XSmibdiLg3Iu4FngKiZrM2RMT9EdGw8kZJXAscQvYXTgrfJ7voZjfgRcD7Oxz/YxHxmvwvsvuA4zscHwBJc8mmVabyiYh4bb7d2OHY7wZmAa+IiFcCP6yz3xSgcNLNk3mRDt1IB5OVCKu0ItWAj5D0O7Kr0K4C7gF+NpaDSrpE0m8lrcjrqSHpfflv3P+VdK6kb+SPbyPpPyQtzbemlYglTZR0ad7TuUXSsZL2knRVftzLJG0raUIe8+D8df8s6XMF38aEvLd2W96T2kLSPXmMGyUtk7RnfqzfS/pQfox2eq71jvUlSTfnX7MdW4z5nNG+XhFxQ0Tc027MAvEXRw74X2Bmh+M/lj8nsqQ+po5CnZ+pPuDLwP8bS+x68ccas0D8DwOnR2QLy0bEg3Ve/gVgTv5z/RVJ/y3p+vxn78g8/mxlf7l8B7gFmNXKZ1rSbLLFtT6WH+dPOvn+S6XAyurLyS75vSG//wbgvFZWZx8l5tT8/xeRfYNmkCXzqUA/2RVv38j3+QFwQH77ZcBtBeIfBZxbc38r4NfANvn9Y4Hz89uvIluq8hDgBmDTAvFnk32I98/vnw98PH8PH84f+wpwE1lduW2AP9S89pYWvlaNjnVK/tjfAP81hu/H//l61dy+B5g2xu93o/j9wPXAn3Q6PnAB8AfgCmCLTr8HsmVOP5bffzxB/IXAHfnP0VeAzToc/2HgFGAZWUdqpwY/g7fktycAW+a3pwEryc7zzAaGgH3z57ajxc808Cng42P5OvbCVuRPgA0R8TCwiaRNIuIKYKyX9Z0gaTlwHdmfN38NXBURj0TEBp5f5wGyZPgNSTcCi4At1Xx87mbgUElfzH9jzgJeDfwij3Mqec8qIlYA3wX+C3hvRDxb8D2siohr89vfI7ssmryNw21YEhHrI+KPwDOSphSMXfRYF9b8v1+bsWHE1ysiHh1DrFbjfxO4OiKu6XT8iHgP2Yf/NrJftGMx8mdqInA08PUxxh01fv4eTiYbgtmbLHn9Q4fjbwY8HdlluueS/UJvRmQXR91EdnHUDOCl+XP3RsR1+e196OxnujKKJN11+RfkauD7kr4GtH3SIP9T/hBgv4h4DVnv8vYmbdw3nh/XmhERDY8fEXcCe5L9oH2W7Lf8ipoYu0XEn9a8ZDdgHfCSFt7KyD9Xh+8PXxo9xAsvkx6iYKWOFo4VDfYpHnzE10vS/HZjtRJf0j+R/RVwUor4+XODZGOVR3XyGMAHyGb1rJR0D7CFpJWdii9pfkQ8EJlnyHrt+zQM0mJ8YDVwcb7LT4AiM5LeSfY92ysiXkv2l8Tm+XNPFGxOy5/pKimSdJcDTwIfA34O/J7GSbKZrYC1EfGkpFcA+5L1Gg6StLWyBXVqPyCXA383fEcFzuBK2g54MiK+Rzbm9jpgG0n75c/3S3pVfvttZL2IA4Gvt9AbfdlwPOAdwP8UfF076h3r2Jr/f9Nu8FG+Xnu2G6tofEnvJ5sZc1zE2IpVjRJ/r+Ex7nxM9wjG9jM72jH2iIjpETE7Imbnz41lXH20r9G2Ne/hL8mG4joWH7iEbLgQ4CDgzjovX082TAbZ5/fBiNgg6Q3Ay+u8
Zimtf6Zrj1NdzcYfgOtHeeymdsczyP6k+RnZn3yXAFeSnbWcB/wOWAJ8G/hcPD9u9O9k41q3AmcXOMZh+f43kn3z55JVvLia7JfICrKeyjSyH7RZ+etOAL5dIP5ssg/x9/L38R/AFtSMf5KdGf5GzWvuyY83m9bHdOsd64v5+1wK7DiG78loX68TyHpCA8D9wLc6HH+A7Bf4jfk2v4Px9yGbfXEzWaL6Pvk4ZCffw4jnxzqmO9rX6Fc17+F7wKQOx58CXJof4zfAaxq8/gd5Oy7I9705v31b/jP6f36uafEzDexc08a2x/jLvtW9OELSh8mmicwhGywfNhm4NiLeNeoL2yRpUkQ8nv9W/AnZia6fdPIYVZL/STs3ssoeZqXjz/ToGo0x/oCsR/rPwCdrHl8fEY8kaMunJB1CNj50OVkv2Mx6lz/To2h6GbCZmXVOO1eNmJlZm5x0zcy6yEnXzKyLnHTNzLrISdfMrIv+PzWUFtf5BsjjAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Feature bmi has the largest correlation with target feature, the correlation is 0.586450134474689\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "from sklearn import datasets, linear_model\n",
    "\n",
    "diabetes = datasets.load_diabetes()                                 # load the diabetes dataset\n",
    "data = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)  # make diabetes dataframe, specify col names\n",
    "target = pd.DataFrame(diabetes.target, columns=[\"target\"])          # target col is the variable we wish to predict from data\n",
    "df = pd.concat([data,target], axis=1)                               # concatenate data and target into one dataframe\n",
    "\n",
    "corr = df.corr()                                                    # pairwise correlation between all columns (features and target)\n",
    "corr_abs = corr.abs()                                               # absolute value of each correlation\n",
    "sns.heatmap(corr_abs,\n",
    "            xticklabels=corr.columns.values,\n",
    "            yticklabels=corr.columns.values)                        # create correlation heat map\n",
    "plt.show()\n",
    "\n",
    "corr_array = np.array(corr[\"target\"])[:-1]                          # signed correlation of each feature with the target (target itself dropped)\n",
    "corr_abs_array = np.array(corr_abs[\"target\"])[:-1]                  # the same correlations in absolute value\n",
    "i = np.argmax(corr_abs_array)                                       # position of the most strongly correlated feature\n",
    "feature = corr.columns[i]                                           # name of that feature\n",
    "print('Feature', feature, 'has the largest correlation with target feature, the correlation is', corr_array[i])"
   ]
  },
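  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The correlation analysis above singles out `bmi` as the feature most strongly correlated with the target. As a rough sketch of the regression step mentioned earlier (not necessarily the approach the later labs take), the cell below fits an ordinary least-squares line to that single feature using scikit-learn's `LinearRegression`. The names `bmi_index`, `X`, `y`, `model` and `r2` are illustrative choices, and fitting and scoring on the same data is an assumption made purely for brevity. With a single feature, $R^2$ should equal the square of the correlation reported above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import datasets, linear_model\n",
    "\n",
    "diabetes = datasets.load_diabetes()                    # reload so this cell is self-contained\n",
    "bmi_index = diabetes.feature_names.index('bmi')        # column position of the bmi feature\n",
    "X = diabetes.data[:, [bmi_index]]                      # 2-D array of shape (442, 1), as scikit-learn expects\n",
    "y = diabetes.target                                    # disease progression measure\n",
    "\n",
    "model = linear_model.LinearRegression()                # ordinary least squares\n",
    "model.fit(X, y)\n",
    "r2 = model.score(X, y)                                 # R^2 on the training data (illustrative only)\n",
    "\n",
    "print('slope:', model.coef_[0], 'intercept:', model.intercept_, 'R^2:', r2)"
   ]
  },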
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion and Further Resources\n",
    "We have only scratched the surface of these powerful modules. Try to get familiar with the features covered here. If you would like to see more examples, we strongly recommend the Python Data Science Handbook: https://jakevdp.github.io/PythonDataScienceHandbook/\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
